AITopics

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Feb-11-2026, 18:45:35 GMT

An Empirical Investigation of Domain Generalization with Empirical Risk Minimizers (Appendix)

Anonymous Submission

Proceedings of the International Conference on Machine Learning 2021

compute, dataset, main paper, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

arXiv.org Artificial Intelligence, Dec-5-2025

Can machines perform a qualitative data analysis? Reading the debate with Alan Turing

De Paoli, Stefano

This paper reflects on the literature that rejects the use of Large Language Models (LLMs) in qualitative data analysis. It illustrates through empirical evidence as well as critical reflections why the current critical debate is focusing on the wrong problems. The paper proposes that the focus of researching the use of the LLMs for qualitative analysis is not the method per se, but rather the empirical investigation of an artificial system performing an analysis. The paper builds on the seminal work of Alan Turing and reads the current debate using key ideas from Turing's "Computing Machinery and Intelligence". This paper therefore reframes the debate on qualitative analysis with LLMs and states that rather than asking whether machines can perform qualitative analysis in principle, we should ask whether with LLMs we can produce analyses that are sufficiently comparable to human analysts. In the final part the contrary views to performing qualitative analysis with LLMs are analysed using the same writing and rhetorical style that Turing used in his seminal work, to discuss the contrary views to the main question.

large language model, machine learning, natural language, (19 more...)

2512.04121

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Neural Information Processing Systems, Aug-18-2025, 14:25:43 GMT

An Empirical Investigation of Domain Generalization with Empirical Risk Minimizers (Appendix)

Anonymous Submission

See Table 1 for the results. We next perform regression in the Joint setting (Sec. 5.3, main paper), where we fit a regression model across all environments, with 5 features instead of 2 reported in the main [...] We find that it is possible to get a Spearman's [...] We considered a set of 40 metrics overall and report only a small subset of them in the main paper. In Table 2 we provide detailed results of all the measures we study. Figure 1 provides details of the canonicalization performed on each of the measures as explained in the main paper. In particular, (Ben-David et al., 2007) prove [...] We also develop measures based on follow-up theoretical work in (Ben-David et al., 2010) on divergence measures using the symmetric difference hypothesis space. Here we summarize a result from (Ben-David et al., 2010), [...] This canonicalization is used to report the results in Sec. 5 [...] H: Z P (Y), we follow the steps in Algorithm 1. Algorithm 1: Computing H-divergence measure. As explained in the main paper, this divergence measure was proposed in (Ben-David et al., 2010).

artificial intelligence, dataset, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Giachino, Gioele, Rondina, Marco, Vetrò, Antonio, Coppola, Riccardo, De Martin, Juan Carlos

An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case

arXiv.org Artificial Intelligence, Jul-28-2025

The increasing use of Large Language Models (LLMs) in a large variety of domains has sparked worries about how easily they can perpetuate stereotypes and contribute to the generation of biased content. With a focus on gender and professional bias, this work examines how LLMs shape responses to ungendered prompts, contributing to biased outputs. This analysis uses a structured experimental method, giving different prompts involving three different professional job combinations, which are also characterized by a hierarchical relationship. This study uses Italian, a language with extensive grammatical gender differences, to highlight potential limitations in current LLMs' ability to generate objective text in non-English languages. Two popular LLM-based chatbots are examined, namely OpenAI ChatGPT (gpt-4o-mini) and Google Gemini (gemini-1.5-flash). Through APIs, we collected 3600 responses. The results highlight how content generated by LLMs can perpetuate stereotypes. For example, Gemini associated 100% (ChatGPT 97%) of 'she' pronouns with the 'assistant' rather than the 'manager'. The presence of bias in AI-generated text can have significant implications in many fields, such as in the workplace or in job selection, raising ethical concerns about its use. Understanding these risks is pivotal to developing mitigation strategies and ensuring that AI-based systems do not increase social inequalities, but rather contribute to more equitable outcomes. Future research directions include expanding the study to additional chatbots or languages, refining prompt engineering methods, or further exploiting a larger experimental base.

large language model, machine learning, natural language, (17 more...)

2507.19156

Country:

Europe (0.93)
North America > United States (0.46)
Asia > Middle East > UAE (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

Neural Information Processing Systems, May-27-2025, 06:19:16 GMT

An Empirical Investigation of Domain Generalization with Empirical Risk Minimizers

Recent work demonstrates that deep neural networks trained using Empirical Risk Minimization (ERM) can generalize under distribution shift, outperforming specialized training algorithms for domain generalization. The goal of this paper is to further understand this phenomenon. In particular, we study the extent to which the seminal domain adaptation theory of Ben-David et al. (2007) explains the performance of ERMs. Perhaps surprisingly, we find that this theory does not provide a tight explanation of the out-of-domain generalization observed across a large number of ERM models trained on three popular domain generalization datasets. This motivates us to investigate other possible measures--that, however, lack theory--which could explain generalization in this setting.

artificial intelligence, generalization, machine learning, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

Neural Information Processing Systems, Jan-19-2025, 12:11:18 GMT

An Empirical Investigation of Domain Generalization with Empirical Risk Minimizers

empirical investigation, empirical risk minimizer, generalization, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

Delaunay, Pierre, Bouthillier, Xavier, Breuleux, Olivier, Ortiz-Gagné, Satya, Bilaniuk, Olexa, Normandin, Fabrice, Bergeron, Arnaud, Carrez, Bruno, Alain, Guillaume, Blanc, Soline, Osterrath, Frédéric, Viviano, Joseph, Creus-Castanyer, Roger, Patil, Darshan, Awal, Rabiul, Zhang, Le

Introducing Milabench: Benchmarking Accelerators for AI

arXiv.org Artificial Intelligence, Nov-22-2024

AI workloads, particularly those driven by deep learning, are introducing novel usage patterns to high-performance computing (HPC) systems that are not comprehensively captured by standard HPC benchmarks. As one of the largest academic research centers dedicated to deep learning, Mila identified the need to develop a custom benchmarking suite to address the diverse requirements of its community, which consists of over 1,000 researchers. This report introduces Milabench, the resulting benchmarking suite. Its design was informed by an extensive literature review encompassing 867 papers, as well as surveys conducted with Mila researchers. This rigorous process led to the selection of 26 primary benchmarks tailored for procurement evaluations, alongside 16 optional benchmarks for in-depth analysis. We detail the design methodology and the structure of the benchmarking suite, and provide performance evaluations using GPUs from NVIDIA, AMD, and Intel. The Milabench suite is open source and can be accessed at github.com/mila-iqia/milabench.

benchmark, machine learning, natural language, (21 more...)

2411.1194

Country: North America > Canada > Quebec (0.04)

Genre: Overview (0.89)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Vision (0.94)

Sultan, Md Arafat, Trivedi, Aashka, Awasthy, Parul, Sil, Avirup

An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation

arXiv.org Artificial Intelligence, Jan-11-2024

We present a large-scale empirical study of how choices of configuration parameters affect performance in knowledge distillation (KD). An example of such a KD parameter is the measure of distance between the predictions of the teacher and the student, common choices for which include the mean squared error (MSE) and the KL-divergence. Although scattered efforts have been made to understand the differences between such options, the KD literature still lacks a systematic study on their general effect on student performance. We take an empirical approach to this question in this paper, seeking to find out the extent to which such choices influence student performance across 13 datasets from 4 NLP tasks and 3 student sizes. We quantify the cost of making sub-optimal choices and identify a single configuration that performs well across the board.
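As a rough illustration of the two distance measures named in this abstract, the sketch below computes the mean squared error and the KL-divergence between a teacher's and a student's predictions for one example. It is not the paper's implementation: the function names, the temperature parameter, the choice to apply MSE to raw logits and KL to softmax probabilities, and the sample logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities (temperature is an assumed knob)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mse_distance(teacher_logits, student_logits):
    """Mean squared error between raw logits (assumed to be the compared quantity)."""
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n


def kl_distance(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over softmax probabilities."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for a single three-class example.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.5]
print("MSE:", mse_distance(teacher, student))
print("KL :", kl_distance(teacher, student))
```

Both functions return zero when the student matches the teacher exactly, but they penalise mismatches differently, which is one reason such a choice can affect student performance.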

configuration, training example, validation example, (14 more...)

2401.06356

Genre: Research Report > New Finding (1.00)

Industry: Education > Assessment & Standards > Student Performance (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ding, Kewen, Vamplew, Peter, Foale, Cameron, Dazeley, Richard

An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments

arXiv.org Artificial Intelligence, Jan-6-2024

One common approach to solve multi-objective reinforcement learning (MORL) problems is to extend conventional Q-learning by using vector Q-values in combination with a utility function. However, issues can arise with this approach in the context of stochastic environments, particularly when optimising for the Scalarised Expected Reward (SER) criterion. This paper extends prior research, providing a detailed examination of the factors influencing the frequency with which value-based MORL Q-learning algorithms learn the SER-optimal policy for an environment with stochastic state transitions. We empirically examine several variations of the core multi-objective Q-learning algorithm as well as reward engineering approaches, and demonstrate the limitations of these methods. In particular, we highlight the critical impact of the noisy Q-value estimates issue on the stability and convergence of these algorithms.
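The general idea of vector Q-values combined with a utility function can be sketched as follows. This is a minimal, generic illustration and not any of the algorithm variants studied in the paper: the state and action names, the two-objective reward, the step size and discount, and the linear utility are all hypothetical. It does not reproduce the stochastic-environment issues the abstract describes.

```python
from collections import defaultdict


def linear_utility(q_vec, weights=(0.5, 0.5)):
    """Hypothetical utility: a weighted sum of the objectives."""
    return sum(w * q for w, q in zip(weights, q_vec))


def greedy_action(q_table, state, actions, utility):
    """Pick the action whose vector Q-value has the highest utility."""
    return max(actions, key=lambda a: utility(q_table[(state, a)]))


def q_update(q_table, state, action, reward_vec, next_state,
             actions, utility, alpha=0.1, gamma=0.9):
    """One Q-learning step applied component-wise to the vector Q-value."""
    best_next = greedy_action(q_table, next_state, actions, utility)
    next_q = q_table[(next_state, best_next)]
    old_q = q_table[(state, action)]
    q_table[(state, action)] = tuple(
        q + alpha * (r + gamma * nq - q)
        for q, r, nq in zip(old_q, reward_vec, next_q)
    )


# Two objectives, so every Q-value is a 2-tuple initialised to zero.
actions = ["left", "right"]
q_table = defaultdict(lambda: (0.0, 0.0))
q_update(q_table, "s0", "right", (1.0, -0.2), "s1", actions, linear_utility)
print(q_table[("s0", "right")])
print(greedy_action(q_table, "s0", actions, linear_utility))
```

The utility function is only consulted when choosing actions, while the stored values stay vector-valued, which is what lets the same table serve different utility functions.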

agent, algorithm, learning rate, (13 more...)

2401.03163

Country:

Oceania > Australia (0.04)
North America > United States > Oklahoma > Payne County > Cushing (0.04)
North America > United States > Colorado (0.04)
North America > United States > Arizona (0.04)

Genre: Research Report (1.00)

Industry:

Media > Television (0.67)
Leisure & Entertainment (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)